A Comparative Study of Language Models for Book and Author Recognition

Authors

  • Özlem Uzuner
  • Boris Katz
Abstract

Linguistic information can help improve evaluation of similarity between documents; however, the kind of linguistic information to be used depends on the task. In this paper, we show that distributions of syntactic structures capture the way works are written and accurately identify individual books more than 76% of the time. In comparison, baseline features such as tf-idf-weighted keywords and function words give an accuracy of at most 66%. However, testing the same features on authorship attribution shows that distributions of syntactic structures are less successful than function words on this task. Syntactic structures vary even among the works of the same author, whereas features such as function words are distributed more similarly among the works of an author and can more effectively capture authorship.
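As a rough illustration of the function-word baseline mentioned in the abstract, the sketch below represents each text as a vector of relative function-word frequencies and compares two texts by cosine similarity. It is not the authors' implementation: the word list `FUNCTION_WORDS`, the tokenizer, the helper names, and the choice of cosine similarity are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short function-word list (an assumption for
# illustration; the list used in the paper is not given in the abstract).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "with"]


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy inputs; real use would compare chapters or whole books.
    text_a = "The cat sat in the hall, and it looked to the door."
    text_b = "A dog ran to the gate, but it stopped in a corner of the yard."
    similarity = cosine_similarity(
        function_word_profile(text_a), function_word_profile(text_b)
    )
    print(f"function-word similarity: {similarity:.3f}")
```

Under these assumptions, texts by the same author would be expected to yield more similar profiles than texts by different authors, which is the intuition the abstract gives for why function words suit authorship attribution.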

Similar resources

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Book Review: 'Ecolinguistics: Language and ecology'

Ecolinguistics: language and ecology delivers an overall view and a critical approach on ecolinguistic studies. This book is an excellent resource to students, researchers, linguists and those working in the area of discourse analysis as well as ecology. The book claims presenting a news course for ecolinguistics including a framework for understanding the theory of ecolinguistics, exploration ...

The Comparative Study of the Iranian EFL Learners Vocabulary Learning through Two Different Formats: Paper & Pencil vs. Software

This study aimed to investigate the effect of using vocabulary software on the vocabulary learning of Iranian EFL learners. Participants of the study were 54 intermediate-level students (23 males and 31 females) learning English as a foreign language in Mehr Institute in Izeh who were selected after taking the Nelson English Language Test as a proficiency test. They were randomly divided into t...

Book Review: "New Geographies of Language: Language Culture and Politics in Wales"

The book New Geographies of Language: Language, Culture and Politics in Wales is naturally seeking a very interesting goal rarely been witnessed before. For one thing, it is trying to mix language and linguistics with a totally distinct science, geography. For another, geography happens to be a literally exotic science. Students all around the world might be generally of two types: Those who lo...

Comparative Study of Nominalization in Applied Linguistics and Biology Books

This study explored nominalized expression types in an applied linguistics book and a biology book as 2 distinct disciplines. The books were carefully read, the nominalized expression types were identified, the frequencies of the nominalization types were counted, and eventually chi-square was administered. Results revealed no significant difference in using nominalization. Furthermore, the den...

Publication date: 2005